Abstract

We present a novel approach to learning a distance metric
for information retrieval. Learning a distance metric from
a number of queries with side information, i.e., relevance
judgments, has been studied widely, for example in pairwise
constraint-based distance metric learning. However, the
capacity of existing algorithms is limited because they usually
assume that the distance between two similar objects
is smaller than the distance between two dissimilar objects.
This assumption may not hold, especially in information
retrieval when the input space is heterogeneous.
To address this problem explicitly, we propose rank-based
distance metric learning. Our approach overcomes
the drawback of existing algorithms by comparing distances
only among the relevant and irrelevant objects for a
given query. To avoid over-fitting, a regularizer based on
the Burg matrix divergence is also introduced. We apply
the proposed framework to tattoo image retrieval in the
forensics and law enforcement application domain. The goal of
the application is to retrieve tattoo images from a gallery
database that are visually similar to a tattoo found on a
suspect or a victim. The experimental results are encouraging
in comparison to the standard approaches for
distance metric learning.


1.  Introduction 

Due to rapid growth in the number of available digital
images, content-based image retrieval (CBIR) has been
extensively studied over the past decade. Most CBIR systems
use low-level image features, such as color, texture, and
shape, to represent the visual content. These features are
automatically extracted from images to compute the similarity
between a query and images in the database [7, 17, 25].
However, the retrieval performance of most CBIR systems
does not currently meet user expectations. The major reason
for this limited performance is that the low-level features
are not able to capture the image similarity
perceived by humans. Consequently, one of the major
challenges in CBIR is how to compensate for the semantic
gap using the low-level features. Several different similarity
functions using low-level features have been proposed and
examined [12, 19, 28]. Nevertheless, as Santini et al. argued
in [20], the only perceptual similarity that can meaningfully
be used is pre-attentive similarity, not semantic similarity.

While most CBIR applications emphasize identifying
semantically similar images, such as “vacation images”, there
is increasing interest in retrieving visually similar images,
such as “different images of the White House”. The concept
of visual similarity is crucial in many real applications,
such as tattoo image retrieval for suspect or victim
identification [15], which plays an important role in forensics
and law enforcement. Because these applications aim to retrieve
different images of the same object (e.g., a tattoo), semantic
perception does not play a major role in retrieval. This
fundamental difference makes it more feasible to retrieve visually
similar images based only on the low-level visual features.

The key to measuring visual similarity between images
accurately is to find an appropriate distance metric for the given
CBIR task. While most existing studies use a pre-defined
distance metric for image similarity measurement, our goal
is to learn a distance metric from a number of training
samples with side information, i.e., relevance judgments. This
approach can be cast into a standard distance metric learning
problem, in which a distance metric is found to keep
the queries close to the relevant objects and far away from
the irrelevant ones. Unfortunately, as revealed by our
empirical study, this strategy does not work well for
information retrieval. This is because most distance metric
learning algorithms assume that two similar objects are separated
by a smaller distance than two dissimilar objects. This
assumption may not hold for information retrieval, especially
when some queries are far away from all the objects in the
database while others are close to many of the objects in the
database. In these cases, the distance from a relevant object
to a “far away” query may be larger than the distance between
an irrelevant object and a “close by” query. We aim to
address this problem by rank-based distance metric learning.
It overcomes the shortcoming of the existing algorithms by
comparing the distances among the relevant and irrelevant
objects of a single query only. A specially designed
regularizer based on the Burg matrix divergence [13] is introduced
to alleviate the over-fitting problem.

The rest of the paper is organized as follows. Section
2 describes related work and Section 3 presents the
rank-based approach for distance metric learning within the
context of image retrieval. Tattoo image retrieval for suspect
or victim identification is described in Section 4 as the
application domain and experimental results are provided in
Section 5. Finally, we conclude our work in Section 6.

2.  Related  Work 

Learning a distance metric from available side information
has attracted much interest in recent studies. The side
information is usually cast in the form of pairwise constraints.
The must-link (or equivalence) constraints are the pairs of
“similar” objects, and the cannot-link (or inequivalence)
constraints are the pairs of “dissimilar” objects. The optimal
distance metric is found such that the objects in must-link
constraints are close to each other while the objects
in cannot-link constraints are well separated. A number
of algorithms have been developed for learning a distance
metric from pairwise constraints, including the convex
programming approach [27, 16], local distance metric
learning [11, 30], relevant component analysis [4],
discriminative component analysis (DCA) [14], support vector
machine based approaches [21], neighborhood component
analysis [9] and its extension [8], the large margin nearest
neighbor (LMNN) classifier [26], a boosting approach [31],
and Bayesian distance metric learning [29].

Most of the algorithms for distance metric learning
assume that the objects in a must-link constraint are separated
by a smaller distance than the objects in a cannot-link
constraint. However, this assumption may not hold
if the input space is heterogeneous and the distances
between objects vary significantly from one location of the
input space to another. As a consequence, it is inappropriate
to directly compare the distance of any must-link
constraint to the distance of any cannot-link constraint. Our
proposed rank-based approach for distance metric learning
overcomes this shortcoming by comparing the distance
of a must-link constraint to that of a cannot-link constraint
only when they are from the “same location” in the input
space, i.e., when they are associated with the same query.

It is worth mentioning that, in addition to the paradigm of
learning a distance metric from pairwise constraints, there are
other approaches for distance metric learning. For instance,
in [21] the authors proposed to learn a distance metric from
relative comparisons. Although the approach in [21] is
similar in spirit to this work, it differs significantly in both
the overall formulation and the regularizers used to avoid
over-fitting. In [31, 10] the authors present a framework of
distance metric learning based on maximum likelihood
estimation.

3.  Distance  Metric  Learning 

The  standard  distance  metric  learning  involves  pairs  of 
objects  that  are  randomly  sampled  from  a  database.  On  the 
other  hand,  in  CBIR  the  pairwise  constraints  are  generated 
by  issuing  queries  against  a  given  database  of  images,  and 
visually  identifying  images  from  the  top  retrieved  ones  that 
are  similar  to  the  query  images. 

Let $\mathcal{D} = \{x_i,\ i = 1, \ldots, N_D\}$ denote the collection of
images to be retrieved, where $x_i \in \mathbb{R}^d$ is a feature vector
of size $d$ that represents the $i$-th image. Let
$\mathcal{Q} = \{q_i,\ i = 1, \ldots, N_Q\}$ denote the set of queries that are
used to generate the pairwise constraints for distance metric learning.
Similar to the images in $\mathcal{D}$, each query image $q_i$ is
represented by a vector of $d$ attributes. For each query $q_i$, we
denote by $\{x_{i1}, \ldots, x_{iK}\}$ the top $K$ images that are retrieved
from $\mathcal{D}$ by the given distance metric $A_0$. We denote by
$y_{ij} \in \{-1, +1\}$ the relevance judgment for the $j$-th
retrieved image $x_{ij}$: $y_{ij} = +1$ when the retrieved image $x_{ij}$
is visually similar to the query image $q_i$, and $-1$ otherwise.
Using the language of pairwise constraints, image $x_{ij}$ and
query $q_i$ form a must-link constraint when $y_{ij} = +1$ and a
cannot-link constraint when $y_{ij} = -1$. Our goal is to learn
a distance metric $A \in \mathbb{R}^{d \times d}$ from the generated pairwise
constraints that improves over the existing metric $A_0$.
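The constraint-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a diagonal initial metric for simplicity, and the names `diag_sq_dist`, `top_k_constraints`, and `relevant_ids` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def diag_sq_dist(x: Sequence[float], y: Sequence[float], a0: Sequence[float]) -> float:
    """Squared distance under a diagonal metric a0 (an assumed simplification)."""
    return sum(w * (xi - yi) ** 2 for w, xi, yi in zip(a0, x, y))


def top_k_constraints(
    query: Sequence[float],
    database: List[Sequence[float]],
    relevant_ids: Set[int],
    a0: Sequence[float],
    k: int,
) -> List[Tuple[int, int]]:
    """Return (image index, label) for the top-k retrieved images of one query.

    label = +1 marks a must-link constraint, label = -1 a cannot-link constraint.
    """
    ranked = sorted(range(len(database)),
                    key=lambda j: diag_sq_dist(query, database[j], a0))
    return [(j, 1 if j in relevant_ids else -1) for j in ranked[:k]]


# Toy example: four database images, one query, image 1 is the relevant one.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
query = [0.9, 0.2]
constraints = top_k_constraints(query, database, relevant_ids={1}, a0=[1.0, 1.0], k=3)
print(constraints)
```

Each query thus contributes its own small set of must-link and cannot-link pairs, which is the grouping the rank-based formulation later relies on.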

3.1.  Constraint-based  Distance  Metric  Learning 

Before presenting the rank-based approach for distance
metric learning, we first present a “typical” distance metric
learning approach for image retrieval. The approach
exploits the assumption that the distance between images in
a must-link constraint tends to be smaller than that for a
cannot-link constraint. We refer to this typical approach
as the “constraint-based” approach to distance metric learning
to distinguish it from the proposed “rank-based” approach.

Following  the  framework  in  [27],  the  optimal  distance 
metric  is  learned  by  minimizing  the  overall  distance  of 
the  must-link  constraints  provided  that  the  images  in  the 
cannot-link  constraints  are  well  separated.  This  principle 
can  be  cast  into  the  following  optimization  problem: 

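$$
\begin{aligned}
\min_{A \in \mathbb{R}^{d \times d}} \quad & \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, +1)\, d(q_i, x_{ij}; A) + \frac{\lambda}{2}\,\mathrm{tr}(A A^\top) \\
\text{s.t.} \quad & d(q_i, x_{ij}; A) \ge 1, \quad \forall\, y_{ij} = -1 \\
& A \succeq 0,
\end{aligned}
\tag{1}
$$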

where $\delta(y, a)$ is a Kronecker delta function that outputs 1 when
$y = a$ and zero otherwise. $d(x, x'; A)$ measures the distance
between images $x$ and $x'$ based on the metric $A$, and
is defined as

$$
d(x, x'; A) = (x - x')^\top A\, (x - x'). \tag{2}
$$
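As a concrete reading of the distance in (2), the quadratic form can be evaluated directly. The sketch below uses plain nested lists for the matrix; the function name `metric_distance` is an illustrative assumption.

```python
from typing import List, Sequence


def metric_distance(x: Sequence[float], y: Sequence[float], A: List[List[float]]) -> float:
    """Evaluate (x - y)^T A (x - y) for a square matrix A given as nested lists."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    return sum(diff[r] * A[r][c] * diff[c] for r in range(n) for c in range(n))


identity = [[1.0, 0.0], [0.0, 1.0]]
weighted = [[2.0, 0.0], [0.0, 0.5]]

# With the identity matrix the distance is the squared Euclidean distance.
print(metric_distance([1.0, 2.0], [0.0, 0.0], identity))
# A different positive semi-definite matrix re-weights the feature dimensions.
print(metric_distance([1.0, 2.0], [0.0, 0.0], weighted))
```

Note that (2) is a squared distance: with the identity matrix it reduces to the squared Euclidean distance, not the Euclidean distance itself.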


There are two sets of constraints in the above optimization
problem. The first set of constraints,
$d(q_i, x_{ij}; A) \ge 1,\ \forall\, y_{ij} = -1$, ensures that the pairs of
images in the cannot-link constraints are well separated. The
second constraint, $A \succeq 0$, ensures that matrix $A$ is positive
semi-definite, so that it indeed defines a metric. The
objective function in (1) consists of two terms. The first term,
i.e., $\sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, +1)\, d(q_i, x_{ij}; A)$, measures the sum
of the distances over all the must-link constraints. By
minimizing this term, we force the images in the must-link
constraints to be close to each other. The second term in
the objective function, i.e., $\lambda\,\mathrm{tr}(A A^\top)/2$, is introduced to
regularize the optimal solution for metric $A$ to be a sparse
matrix. This is similar to the quadratic regularizer used in
the support vector machine (SVM) [5]. Finally, the above
problem is a semi-definite programming (SDP) problem and
can in general be solved by an interior point method [24].

The main shortcoming of the constraint-based approach
is that the distance between objects in must-link constraints
may vary significantly from one query to another. As a
result, the sum of the distances for all the must-link constraints
may be dominated by a small number of queries that are
very far from the images in the database, and most
of the optimization effort is spent on reducing the distance
for these far away queries. According to the representer
theorem, the optimal solution $A^*$ to the optimization problem
in (1) can be written as:

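$$
A^* = \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, -1)\, \theta_{ij}\, (q_i - x_{ij})(q_i - x_{ij})^\top
+ \eta \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, +1)\, (q_i - x_{ij})(q_i - x_{ij})^\top, \tag{3}
$$

where $\theta_{ij}$ and $\eta$ are weights assigned to each pairwise
constraint. As indicated by the above expression, every must-link
constraint (i.e., $y_{ij} = +1$) is assigned the same weight $\eta$.
As a consequence, the optimal metric $A^*$ may be dominated
by the far away queries.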

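One may consider improving the above approach by
viewing the problem of distance metric learning as a
binary classification problem, and casting it into the following
optimization problem:

$$
\begin{aligned}
\min_{A \in \mathbb{R}^{d \times d}} \quad & \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, +1)\, \epsilon_{ij} + \frac{\lambda}{2}\,\mathrm{tr}(A A^\top) \\
\text{s.t.} \quad & d(q_i, x_{ij}; A) \ge 1, \quad \forall\, y_{ij} = -1 \\
& d(q_i, x_{ij}; A) \le 1 + \epsilon_{ij}, \quad \epsilon_{ij} \ge 0, \quad \forall\, y_{ij} = +1 \\
& A \succeq 0,
\end{aligned}
\tag{4}
$$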


where slack variables $\epsilon_{ij} \ge 0$ are introduced to account for
the errors in classifying images to be similar. Similar to
the previous analysis, we have a representer theorem for the
optimal solution $A^*$, i.e.,

$$
A^* = \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \theta_{ij}\, (q_i - x_{ij})(q_i - x_{ij})^\top. \tag{5}
$$

Note that the weights assigned to must-link constraints by
the above optimization problem are no longer a single
parameter as in (3). However, the following theorem
illustrates that the formulation in (4) still puts more emphasis
on the distances associated with the far away queries.

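Theorem 1 The problem in (4) is equivalent to the following
optimization problem:

$$
\begin{aligned}
\min_{A \in \mathbb{R}^{d \times d}} \quad & \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, +1)\, l\big(d(q_i, x_{ij}; A)\big) + \frac{\lambda}{2}\,\mathrm{tr}(A A^\top) \\
\text{s.t.} \quad & d(q_i, x_{ij}; A) \ge 1, \quad \forall\, y_{ij} = -1 \\
& A \succeq 0,
\end{aligned}
\tag{6}
$$

where $l(d) = \max(0, d - 1)$.

The above result follows from the fact that
$\epsilon_{ij} = \max(0, d(q_i, x_{ij}; A) - 1)$. As indicated by the above
theorem, the loss function $l(d)$ removes any must-link
constraint whose distance $d$ is less than 1, and as a result,
the impact of far away queries is further amplified by $l(d)$.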
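A small numeric illustration of this amplification is given below. The distances are made-up values for two hypothetical queries, one close to its relevant images and one far from them; only the loss $l(d) = \max(0, d - 1)$ comes from the text.

```python
def hinge_loss(d: float) -> float:
    """The loss l(d) = max(0, d - 1) applied to a must-link distance d."""
    return max(0.0, d - 1.0)


# Hypothetical must-link distances for a "close by" and a "far away" query.
must_link_distances = {
    "near_query": [0.2, 0.4, 0.9],
    "far_query": [6.0, 7.5],
}

# Contribution of each query to the first term of the objective in (6).
contribution = {
    name: sum(hinge_loss(d) for d in dists)
    for name, dists in must_link_distances.items()
}
print(contribution)
```

In this toy setting the near query contributes nothing to the objective, so the optimizer's effort goes entirely into shrinking the distances of the far query.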

3.2.  Rank-based  Distance  Metric  Learning 

To address the problem that arises when the input space is
heterogeneous and the distance in must-link constraints may vary
significantly from one query to another, we propose to learn
the distance metric by a rank-based approach. In
particular, instead of requiring the distance of any must-link
constraint to be smaller than that of a cannot-link constraint,
we only compare the distances of pairwise constraints that
are generated by the same query. Hence, a must-link
constraint is supposed to have a smaller distance than a
cannot-link constraint only when they are from the same query. We
cast this idea into the following optimization problem:

$$
\begin{aligned}
\min_{A \in \mathbb{R}^{d \times d}} \quad & \sum_{i=1}^{N_Q} \sum_{k,j=1}^{K} \delta(y_{ij}, -1)\, \delta(y_{ik}, +1)\, \epsilon^{i}_{j,k} + \frac{\lambda}{2}\,\mathrm{tr}(A A^\top) \\
\text{s.t.} \quad & d(q_i, x_{ij}; A) - d(q_i, x_{ik}; A) \ge 1 - \epsilon^{i}_{j,k}, \quad \epsilon^{i}_{j,k} \ge 0 \\
& A \succeq 0.
\end{aligned}
\tag{7}
$$

Note that a slack variable $\epsilon^{i}_{j,k} \ge 0$ is introduced when
comparing a must-link constraint (i.e., $y_{ik} = +1$) and a
cannot-link constraint (i.e., $y_{ij} = -1$) that share the same query.
Since only the constraints sharing the same query are
compared in computing the distance metric, we only
require the distance of a must-link constraint to be relatively
small compared to the distance of a cannot-link constraint
of the same query, and therefore avoid the shortcoming of the
constraint-based approach for distance metric learning.
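The per-query comparison in (7) can be illustrated with a short sketch that computes, for one query, the slack each (cannot-link, must-link) pair would need under a fixed metric. The function name `rank_slacks` and the toy distances are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def rank_slacks(dists: List[float], labels: List[int]) -> Dict[Tuple[int, int], float]:
    """Slack needed for each (cannot-link j, must-link k) pair of a single query.

    dists[j] is d(q, x_j; A) and labels[j] is +1 (relevant) or -1 (irrelevant).
    The constraint d_j - d_k >= 1 - slack gives slack = max(0, 1 - (d_j - d_k)).
    """
    return {
        (j, k): max(0.0, 1.0 - (dists[j] - dists[k]))
        for j in range(len(dists)) if labels[j] == -1
        for k in range(len(dists)) if labels[k] == +1
    }


# A "far away" query: its relevant image is at distance 10.0, which would look
# bad next to another query's cannot-link distance of, say, 3.0. Here it is only
# compared with the irrelevant images retrieved for the same query.
far_query_slacks = rank_slacks(dists=[10.0, 12.5, 9.0], labels=[+1, -1, -1])
print(far_query_slacks)
```

Only the irrelevant image that is ranked ahead of the relevant one for this query incurs a positive slack; the absolute scale of the query's distances does not matter.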

Although the formulation in (7) addresses the shortcomings
of the constraint-based approach, it does not take into
account the existing distance metric $A_0$ when learning a
new distance metric from pairwise constraints. This could
be important if $A_0$ is engineered by a domain expert to
take into account the domain knowledge. It is also
useful to take $A_0$ into account if we learn the distance
metric $A$ in a sequential manner and $A_0$ is a distance metric
learned from the pairwise constraints collected in the
previous iterations. In order to explicitly take $A_0$ into account,
we replace the regularizer $\lambda\,\mathrm{tr}(A A^\top)/2$ with the Burg
matrix divergence [13], which is defined as follows:

$$
D(A, A_0) = \mathrm{tr}\left(A A_0^{-1} (A A_0^{-1})^\top\right) - 2 \log\det(A A_0^{-1}) - d. \tag{8}
$$

Since $A$ and $A_0$ may have different scales, we normalize
matrix $A_0$ as follows before computing the divergence:

$$
A_0 \leftarrow A_0\, \frac{\mathrm{tr}(A)}{\mathrm{tr}(A_0)}.
$$


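Using the above matrix divergence, the problem in (7) is
modified as follows:

$$
\begin{aligned}
\min_{A \in \mathbb{R}^{d \times d}} \quad & \sum_{i=1}^{N_Q} \sum_{k,j=1}^{K} \delta(y_{ij}, -1)\, \delta(y_{ik}, +1)\, \epsilon^{i}_{j,k} + \frac{\lambda}{2}\, D(A, A_0) \\
\text{s.t.} \quad & d(q_i, x_{ij}; A) - d(q_i, x_{ik}; A) \ge 1 - \epsilon^{i}_{j,k}, \quad \epsilon^{i}_{j,k} \ge 0 \\
& A \succeq 0.
\end{aligned}
\tag{9}
$$

By minimizing the divergence between $A$ and $A_0$, we
require the learned distance matrix $A$ to be similar to $A_0$.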
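For diagonal matrices the product $A A_0^{-1}$ is itself diagonal, so the divergence in (8) reduces to a sum over the diagonal ratios. The sketch below evaluates that special case, including the trace normalization of $A_0$; it is an illustration under this diagonal assumption, and the name `burg_divergence_diag` is hypothetical.

```python
import math
from typing import Sequence


def burg_divergence_diag(a: Sequence[float], b: Sequence[float]) -> float:
    """D(A, A_0) from (8) for A = diag(a), A_0 = diag(b), all entries positive.

    A_0 is first rescaled so that tr(A_0) equals tr(A).
    """
    scale = sum(a) / sum(b)
    ratios = [ai / (bi * scale) for ai, bi in zip(a, b)]
    return (sum(r * r for r in ratios)
            - 2.0 * sum(math.log(r) for r in ratios)
            - len(a))


# A is a rescaled copy of A_0: after normalization the divergence vanishes.
print(burg_divergence_diag([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# A puts most weight on one dimension while A_0 is the identity.
print(burg_divergence_diag([1.0, 1.0, 4.0], [1.0, 1.0, 1.0]))
```

Each term $r^2 - 2\log r - 1$ is non-negative and zero only at $r = 1$, so the value grows as the diagonal of $A$ departs from the (normalized) diagonal of $A_0$.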


Remark To better understand the matrix divergence
$D(A, A_0)$ in (8), we consider the special case when both
$A$ and $A_0$ are diagonal matrices, i.e., $A = \mathrm{diag}(a_1, \ldots, a_d)$
and $A_0 = \mathrm{diag}(b_1, \ldots, b_d)$. The divergence is now
simplified as follows:

$$
\begin{aligned}
D(A, A_0) &= \sum_{i=1}^{d} \left(\frac{a_i}{\tilde{b}_i}\right)^2 - 2 \sum_{i=1}^{d} \log\left(\frac{a_i}{\tilde{b}_i}\right) - d \\
&\approx \sum_{i=1}^{d} \left(\frac{a_i}{\tilde{b}_i} - 1\right)^2,
\end{aligned}
$$

where $\tilde{b}_i = b_i \sum_{j=1}^{d} a_j / \sum_{j=1}^{d} b_j$. The above
approximation follows from the first-order approximation
$\log x \approx x - 1$. When $A_0$ is an identity matrix, the divergence
$D(A, A_0)$ is further approximated as

$$
D(A, A_0) \approx \frac{1}{\bar{a}^2} \sum_{i=1}^{d} (a_i - \bar{a})^2,
$$

where $\bar{a} = \sum_{i=1}^{d} a_i / d$. The above analysis indicates
that when $A_0$ is an identity matrix, the matrix divergence
$D(A, A_0)$ essentially measures the variance in the diagonal
elements of matrix $A$. Thus, by minimizing the divergence,
the resulting matrix $A$ tends to have a flat distribution over
its diagonal elements.

3.3. Efficient Implementation

The distance metric learning algorithm described above
requires finding the optimal matrix $A$. This is usually
computationally expensive because (i) the number of
elements in $A$ is quadratic in the number of dimensions used
to represent images, and (ii) $A$ is required
to be positive semi-definite. We reduce the computational
cost by assuming $A$ to be a diagonal matrix, i.e.,
$A = \mathrm{diag}(a_1, \ldots, a_d)$, such that
$d(x, x'; A) = d(x, x'; a) = \sum_{i=1}^{d} (x_i - x'_i)^2 a_i$, where $x_i$ and
$x'_i$ here denote the $i$-th components of $x$ and $x'$. Then, the
problems in (1) and (9) are simplified as

$$
\begin{aligned}
\min_{a \in \mathbb{R}^{d}} \quad & \sum_{i=1}^{N_Q} \sum_{j=1}^{K} \delta(y_{ij}, +1)\, d(q_i, x_{ij}; a) + \frac{\lambda}{2} \sum_{i=1}^{d} a_i^2 \\
\text{s.t.} \quad & d(q_i, x_{ij}; a) \ge 1, \quad \forall\, y_{ij} = -1 \\
& a_i \ge 0, \quad i = 1, \ldots, d
\end{aligned}
\tag{10}
$$

and

$$
\begin{aligned}
\min_{a \in \mathbb{R}^{d}} \quad & \sum_{i=1}^{N_Q} \sum_{k,j=1}^{K} \delta(y_{ij}, -1)\, \delta(y_{ik}, +1)\, \epsilon^{i}_{j,k} + \frac{\lambda}{2} \sum_{i=1}^{d} (a_i - \bar{a})^2 \\
\text{s.t.} \quad & d(q_i, x_{ij}; a) - d(q_i, x_{ik}; a) \ge 1 - \epsilon^{i}_{j,k}, \quad \epsilon^{i}_{j,k} \ge 0 \\
& a_i \ge 0, \quad i = 1, \ldots, d,
\end{aligned}
\tag{11}
$$

respectively. In the above, an identity matrix is assumed
for $A_0$. Both problems in (10) and (11) can be solved by
standard quadratic programming techniques.
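To make the diagonal rank-based formulation concrete, the sketch below minimizes a hinge-loss form of the rank-based objective with the variance regularizer $\frac{\lambda}{2}\sum_i (a_i - \bar{a})^2$ over non-negative diagonal weights. It deliberately swaps in projected subgradient descent for the quadratic programming solver mentioned in the text, purely to stay dependency-free; the triplet data, step size, and the names `diag_dist` and `train_rank_metric` are assumptions.

```python
from typing import List, Sequence, Tuple

# (query, relevant image, irrelevant image), all from the same query.
Triplet = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def diag_dist(x: Sequence[float], y: Sequence[float], a: Sequence[float]) -> float:
    """d(x, y; a) = sum_i a_i (x_i - y_i)^2 for a diagonal metric."""
    return sum(w * (xi - yi) ** 2 for w, xi, yi in zip(a, x, y))


def train_rank_metric(triplets: List[Triplet], dim: int, lam: float = 0.1,
                      lr: float = 0.05, steps: int = 200) -> List[float]:
    """Projected subgradient descent on hinge slacks plus the variance regularizer."""
    a = [1.0] * dim  # start from the identity metric
    for _ in range(steps):
        mean_a = sum(a) / dim
        # Gradient of (lam / 2) * sum_i (a_i - mean)^2 is lam * (a_i - mean).
        grad = [lam * (w - mean_a) for w in a]
        for q, pos, neg in triplets:
            margin = diag_dist(q, neg, a) - diag_dist(q, pos, a)
            if margin < 1.0:  # slack is positive, so the hinge term is active
                for m in range(dim):
                    grad[m] -= (q[m] - neg[m]) ** 2 - (q[m] - pos[m]) ** 2
        # Gradient step followed by projection onto a_i >= 0.
        a = [max(0.0, w - lr * g) for w, g in zip(a, grad)]
    return a


# Toy data: dimension 0 is informative, dimension 1 behaves like noise.
TRIPLETS: List[Triplet] = [
    ([0.0, 0.0], [0.1, 2.0], [1.0, 0.1]),
    ([5.0, 5.0], [5.2, 3.0], [6.5, 5.0]),
]
learned = train_rank_metric(TRIPLETS, dim=2)
print(learned)
```

On this toy data the identity metric ranks the irrelevant image ahead of the relevant one for both queries, while the learned weights shift mass to the informative dimension. A proper QP solver would give the exact optimum; the subgradient iterate only hovers near it.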

Remark It is interesting to examine the regularizer
$\sum_{i=1}^{d} (a_i - \bar{a})^2$ from the viewpoint of the graph Laplacian.
We can rewrite the regularizer in matrix form, i.e.,

$$
\sum_{i=1}^{d} (a_i - \bar{a})^2 = a^\top \left(I - \mathbf{1}\mathbf{1}^\top / d\right) a = a^\top L a,
$$

where $a = (a_1, \ldots, a_d)^\top$, $\mathbf{1}$ is the all-ones vector, and
$L$ is a graph Laplacian constructed from a
fully connected graph with every edge weighted equally. If
we have more knowledge regarding the features, we can
adopt a different weight for the pairwise relationship
between any two features, which will lead to a very different
graph Laplacian.
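The identity between the variance regularizer and the quadratic form with $L = I - \mathbf{1}\mathbf{1}^\top/d$ can be checked numerically. The sketch below does so for an arbitrary weight vector; the function names are illustrative.

```python
from typing import List, Sequence


def variance_regularizer(a: Sequence[float]) -> float:
    """sum_i (a_i - mean(a))^2."""
    mean_a = sum(a) / len(a)
    return sum((w - mean_a) ** 2 for w in a)


def laplacian_form(a: Sequence[float]) -> float:
    """a^T L a with L = I - (1 1^T) / d, built explicitly as nested lists."""
    d = len(a)
    L: List[List[float]] = [
        [(1.0 if r == c else 0.0) - 1.0 / d for c in range(d)] for r in range(d)
    ]
    return sum(a[r] * L[r][c] * a[c] for r in range(d) for c in range(d))


weights = [1.0, 2.0, 4.0]
print(variance_regularizer(weights), laplacian_form(weights))
```

Both expressions evaluate to $\sum_i a_i^2 - (\sum_i a_i)^2 / d$, which is why replacing $L$ by a Laplacian with non-uniform edge weights is a natural generalization.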


Figure 1. Examples of tattoos belonging to well-known gangs:
(a) Brazers, (b) Latin Kings, (c) Family Stones, and (d) Insane
Deuces [1]


Figure 2. Illustration of large intra-class variability in tattoo
images. All the above images belong to the FIRE category


4. Tattoo Images for Victim and Suspect Identification

Tattoos engraved on the human body are routinely used to
assist in human identification in forensic applications. This
is not only because of the increasing prevalence of tattoos,
but also due to their impact on other methods of human
identification such as visual, pathological, or trauma-based
identification [22]. The role of tattoos is particularly
important when the primary biometric traits, e.g., fingerprints or
face, are either no longer available or corrupted (e.g.,
victims of the Asian tsunami and the 9/11 terrorist attacks). A study
by Burma [6] found that delinquents are significantly more
likely to have tattoos than non-delinquents, which indicates
that tattoos could provide a source of information for
determining gang membership. Many law enforcement agencies
maintain a database of tattoos, i.e., tattoos held in the
Computerized Criminal History Records, and it is now
common practice to photograph and catalog tattoo patterns
to identify victims and criminals (e.g., gang membership,
see Figure 1) [23, 1]. While a tattoo does not uniquely
establish the identity of a suspect or a victim, it helps in
narrowing down the possible identities since tattoos often
indicate gang membership, religious beliefs, previous
convictions, military service, etc.

The ANSI/NIST-ITL 1-2000 document [3] contains
classification standards for tattoo images. The standard has
eight major tattoo classes, such as human, animal, symbol,
etc., and 80 subclasses. Current practice in law enforcement
agencies is to match a query tattoo by performing manual
searches in the tattoo database based on matching the class
labels. This process is subjective, has limited performance,
and is time-consuming. Further, a simple class label
used as a textual query does not capture all the semantics
of a tattoo image, as is evident from the large intra-class
variability (see Figure 2).

Jain et al. proposed a CBIR system for tattoo image
matching and retrieval [15]. Although this system showed
promising results, its performance is limited because it
employs a predefined similarity measure without appropriately
weighting different features. We aim to improve its
performance by applying the proposed rank-based distance metric
learning framework.

4.1.  Tattoo  Image  Database 

We use the same tattoo database as in [15], which
contains 2,157 tattoo images downloaded from the web [2] and
belonging to eight main classes and 20 subclasses in the
ANSI/NIST standard [3]. Multiple acquisitions of the same
tattoo may look different because of varying imaging
conditions, such as brightness, viewpoint, and distance (see Figure
3). A tattoo image retrieval system should be invariant to
these imaging conditions. To simulate the various imaging
conditions, we follow the work in [15] and generate 20
transformed images for every tattoo image in the database
(see Figure 4). This results in a total of 43,140 synthesized
images.

4.2.  Image  Features 

We use the same low-level image attributes as in [15],
i.e., color, shape, and texture. The overall size of the feature
vector is 272. Similar features have also been used in many
other CBIR systems and are summarized below.

Color Two color descriptors, color histogram and color
correlogram, are extracted from the RGB space. A color
correlogram stores the probability of finding a pixel of color
j at a distance k from a pixel of color i in the image. The
color histogram and correlogram are calculated by dividing
each color component into 20 and 63 bins, resulting in a
total of 60 and 189 bins for the color histogram and
correlogram, respectively. For computational efficiency, we
compute the color autocorrelogram only between identical colors in
a local neighborhood, i.e., i = j and k = 1, 3, 5.

Shape Based on 2nd and 3rd order moments, a set of
seven features that are invariant to translation, rotation, and
scale is obtained. Two different feature sets are extracted,
one from the segmented grayscale tattoo images and the other
from the gradient tattoo images.

Texture The Edge Direction Coherence Vector stores the
ratio of coherent to non-coherent edge pixels with the same
quantized direction (within an interval of 10 degrees). A
threshold (0.1% of image size) on the edge-connected
components in a given direction is used to decide the region
coherency. This feature discriminates structured edges from
randomly distributed edges.

Figure 3. Eight different images of a butterfly tattoo taken under different imaging conditions


Figure 4. Examples of tattoo image transformation: (a) original, variations due to (b) blurring, (c) and (d) aspect ratio change, (e) illumination,
(f) additive noise, (g) color transformation, and (h) rotation


The histogram intersection based approach used in [15]
to measure image similarity serves here as the baseline.
This similarity measure calculates the overlapping
area between two normalized histograms.
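The baseline similarity can be sketched in a few lines. This is a generic histogram intersection on L1-normalized histograms, not the exact implementation of [15]; the function name is an assumption.

```python
from typing import List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Overlapping area of two histograms after normalizing each to sum to 1."""
    total1, total2 = sum(h1), sum(h2)
    p: List[float] = [v / total1 for v in h1]
    q: List[float] = [v / total2 for v in h2]
    return sum(min(pi, qi) for pi, qi in zip(p, q))


print(histogram_intersection([2, 2, 0], [1, 1, 2]))   # partial overlap
print(histogram_intersection([3, 1, 0], [3, 1, 0]))   # identical histograms
print(histogram_intersection([1, 0, 0], [0, 0, 1]))   # disjoint histograms
```

The value lies between 0 (no overlap) and 1 (identical normalized histograms), so one minus the intersection can serve as a distance.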

5.  Experimental  Results 

We evaluate the proposed algorithm for distance metric
learning on the tattoo image retrieval problem. We assume that
the query tattoo images are taken under imperfect imaging
conditions and therefore can be simulated by the
transformed images that were described in Section 4. A retrieved
image is deemed to be relevant when the query image was
generated from the retrieved image by one of the image
transformations shown in Figure 4. The number of queries
is 43,140 and the size of the database is 2,157. The distance
metric is learned off-line from a pool of training examples
and, as a result, the matching procedure using the learned
distance metric takes the same time as the baseline.

Since there is only one true “similar” image in the
database for every query image, we adopt the cumulative
matching characteristic (CMC) [18] curve as the evaluation
metric. This metric accumulates the number of correctly
retrieved images as the rank is increased. For cross validation,
we divided the set of query images (43,140 images)
into ten folds of equal size. One fold of query images is
selected for testing, and 5,000 images are randomly selected
from the remaining nine folds for training. This procedure
is repeated for every fold of query images and the CMC
curve, averaged over 10 experiments, is reported with the
mean of standard deviations, σ, of all ranks.
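A CMC curve of this kind can be computed from the rank at which each query's single true match is retrieved. The sketch below assumes 1-based ranks and reports, for each rank r, the fraction of queries whose match appears within the top r; the function name and the toy ranks are illustrative.

```python
from typing import List, Sequence


def cmc_curve(match_ranks: Sequence[int], max_rank: int) -> List[float]:
    """Fraction of queries whose true match is retrieved at rank <= r, for r = 1..max_rank."""
    n = len(match_ranks)
    return [sum(1 for rank in match_ranks if rank <= r) / n
            for r in range(1, max_rank + 1)]


# Toy example: rank of the true match for five hypothetical queries.
ranks = [1, 3, 2, 1, 7]
curve = cmc_curve(ranks, max_rank=5)
print(curve)
```

The first entry of the curve is the rank-1 retrieval accuracy, and the curve is non-decreasing by construction.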

Before presenting our results on rank-based distance metric
learning, we first examine the hypothesis that is used
by many other distance metric learning algorithms, namely
that the distance between two similar objects in a must-link pair
is usually smaller than the distance between two dissimilar
objects in a cannot-link pair. Figure 5 shows the distance
distributions based on histogram intersection for both
must-link pairs and cannot-link pairs. We notice that the distance
distribution for the must-link pairs has a long tail,
which makes it difficult to differentiate them from
cannot-link pairs. This suggests that the hypothesis assumed by
many distance metric learning algorithms may not hold in
our image retrieval problem. In this experimental study, we
aim to address three important questions:

•  Will the rank-based framework be more effective than
the constraint-based framework for distance metric
learning in the case of image retrieval?

•  How important is the regularizer in learning a distance
metric for image retrieval?

•  How to efficiently train a distance metric by the
rank-based framework?

Comparison of Distance Metric Learning Algorithms

We now compare the rank-based approach for distance
metric learning to the constraint-based approach. Figure 6
shows the retrieval performance of the two distance metric
learning approaches. First, we observe that the rank-based
approach significantly outperforms the constraint-based
approach at every rank. For instance, the rank-1 retrieval
accuracy of the rank-based approach is over 71% while
the accuracy of the constraint-based approach is less than
65%. Besides, the constraint-based approach shows very
little improvement over the baseline. In fact, it performs
noticeably worse than the baseline for the first 5 ranks. This
result implies that directly comparing the distance of any
must-link constraint to that of any cannot-link constraint
may be inappropriate if the input space is heterogeneous.
Overall, we observe a significant improvement made by the
rank-based approach for distance metric learning in
comparison to the baseline approach, suggesting that the
proposed rank-based approach is effective in handling a
heterogeneous input space.


Figure 5. Distance distributions for must-link and cannot-link pairs


Figure 6. Retrieval accuracy of the rank-based approach and the
constraint-based approach for distance metric learning


Effects of Regularizer This experiment examines the
effect of the regularizer in (11) by varying the value of the
regularization parameter λ. Figure 7 summarizes the
retrieval performance of the rank-based approach with
different values of λ. Without a regularizer, i.e., λ = 0 in (11), the
retrieval performance of the rank-based approach is similar
to the baseline. By increasing the value of the regularization
parameter from 1 to 100, we observe an overall increase in
the retrieval performance. These results indicate the
importance of the regularizer for distance metric learning. Also, the
overall monotonic trend with increasing value of λ makes
it relatively easy to choose an appropriate value for λ. In
fact, the retrieval performance remains almost unchanged
once the regularization parameter passes a certain
threshold. We found that the threshold value for the regularization
parameter depends on the size of the training set. In particular, we
observed a larger value for the threshold of the parameter
when the number of training examples is increased.


Figure 7. Retrieval accuracy of the rank-based approach using
different regularization parameter values


Figure 8. Retrieval accuracy of the rank-based approach for distance
metric learning using different training pairs

Efficient Training for Distance Metric Learning There
are a total of 93 million image pairs (43,140 queries × 2,157
database images) in our experiments. It is
thus computationally infeasible to use all the pairs for
training. Instead, we focus on training the distance metric by
selecting “critical” image pairs. The critical image pairs for
each query image are formed by the top list of irrelevant
images that are retrieved by the baseline approach. In addition,
to preserve the diversity of the training pairs, we also
randomly select a few images for each query to form additional
cannot-links. Figure 8 shows the results of the rank-based
approach trained with two different sets of pairs: (i) the
critical pairs formed by the top ranked 20 irrelevant images,
and (ii) the critical pairs formed by the top ranked 10
irrelevant images and 10 randomly selected images. The results
show that although the same number of cannot-link pairs is
used in both experiments, the distance metric trained
from the combination of top ranked images and randomly
chosen images performs much better. We attribute the
difference to the fact that the top ranked irrelevant images may
not be able to represent the feature distribution of images
in the entire database. The randomly chosen images from
outside the top ranked images provide general information
about the input space, while the top ranked images supply
detailed information only about a given query and its irrelevant
images.
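The second selection strategy, combining hard and random negatives, can be sketched as below. The function assumes the irrelevant images of one query are already sorted by the baseline distance; the name `select_cannot_links` and the explicit random generator argument are illustrative choices.

```python
import random
from typing import List


def select_cannot_links(ranked_irrelevant: List[int], n_top: int, n_random: int,
                        rng: random.Random) -> List[int]:
    """Pick the n_top highest-ranked irrelevant images plus n_random others.

    ranked_irrelevant holds database indices of irrelevant images for one query,
    sorted from most to least similar under the baseline measure.
    """
    top = ranked_irrelevant[:n_top]
    rest = ranked_irrelevant[n_top:]
    extra = rng.sample(rest, min(n_random, len(rest)))
    return top + extra


# Toy example: 100 irrelevant images, already ranked by the baseline.
ranked = list(range(100))
chosen = select_cannot_links(ranked, n_top=10, n_random=10, rng=random.Random(0))
print(chosen)
```

Strategy (i) from the text corresponds to `n_top=20, n_random=0`; both strategies yield 20 cannot-link pairs per query.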

6.  Conclusions 

In this paper, we examined the problem of distance
metric learning in the context of image retrieval. We
presented a rank-based framework for distance metric learning
that explicitly addresses the problem of a heterogeneous
input space. Our approach differs from previous
approaches, e.g., pairwise constraint-based distance metric
learning, in that it does not assume shorter distances
among relevant objects than between irrelevant
objects. The experimental results show that our approach is
more effective than the existing algorithms.